T11 - Activation Functions - ReLU, Leaky ReLU, ELU, PReLU, Softmax, Swish, Softplus

Activation Function

  1. Sigmoid
  2. Tanh (Hyperbolic Tangent)
  3. ReLU (Rectified Linear Unit)
  4. Leaky ReLU
  5. ELU (Exponential Linear Unit)
  6. PReLU (Parametric ReLU)
  7. Swish
  8. Softplus
  9. Softmax

1. Sigmoid Activation Function

The sigmoid activation function squashes its input into a value between 0 and 1.

Its derivative ranges between 0 and 0.25.

image.png
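As a minimal NumPy sketch (function names are my own), sigmoid and its derivative can be written as:

```python
import numpy as np

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z)): output always lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)): peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1.0 - s)
```

Note that the derivative is largest (0.25) at z = 0 and shrinks toward 0 as |z| grows, which is the root of the vanishing gradient problem discussed below.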

2. Tanh Activation Function

Tanh gives values ranging between -1 and 1.

The derivative of tanh ranges between 0 and 1.

image.png
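A quick sketch of tanh and its derivative in NumPy (names are my own):

```python
import numpy as np

def tanh(z):
    # output lies in (-1, 1)
    return np.tanh(z)

def tanh_derivative(z):
    # d/dz tanh(z) = 1 - tanh(z)^2: lies in (0, 1], maximum 1 at z = 0
    return 1.0 - np.tanh(z) ** 2
```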

With sigmoid and tanh, the vanishing gradient problem arises: their derivatives are at most 0.25 and 1, so repeatedly multiplying them during backpropagation shrinks the gradient toward 0. To avoid this, we use ReLU.

3. ReLU Activation Function

Its value is given by max(z, 0).

Derivative of ReLU: 1 when z > 0, and 0 when z < 0.

image.png

ReLU is mostly used in hidden layers because it mitigates the vanishing gradient problem.

Now there is a problem with ReLU: when z < 0, the derivative is 0, so substituting it into the chain rule during weight updates leaves the new weights identical to the old ones (the neuron "dies"). To overcome that problem, we use Leaky ReLU.
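A minimal NumPy sketch of ReLU and its derivative (the value at exactly z = 0 is a convention; 0 is used here):

```python
import numpy as np

def relu(z):
    # max(z, 0): passes positive values through, clamps negatives to 0
    return np.maximum(z, 0.0)

def relu_derivative(z):
    # 1 where z > 0, 0 where z <= 0 (the dead-neuron region)
    return (np.asarray(z) > 0).astype(float)
```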

4. Leaky ReLU Activation Function

In Leaky ReLU, a small constant (e.g. 0.01) is multiplied with z when z < 0. The derivative is therefore never exactly 0, so there is always some change between the old and new weight values.

A small but non-zero slope (the constant, e.g. 0.01) is present when z < 0 in the derivative of Leaky ReLU.

image.png

There is still a problem with Leaky ReLU.

When z < 0, the derivative is 0.01 instead of 0. This is a very, very small value, so when it is substituted into the weight-update formula, the change between the old and new weights can be negligible, which again leads to the vanishing gradient problem.

image.png
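A sketch of Leaky ReLU in NumPy, with the slope constant (0.01 here, as in the notes) exposed as a parameter:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # z when z > 0; alpha * z when z <= 0, so the output is never flat at 0
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    # 1 on the positive side, a small non-zero slope alpha on the negative side
    return np.where(z > 0, 1.0, alpha)
```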

5. ELU (Exponential Linear Unit) Function

To overcome the problems of Leaky ReLU, the ELU activation function is used.

Whenever z is greater than 0, the output is z (as in max(0, z)). But when z < 0, the negative values are handled in a more efficient way: the output is alpha * (e^z - 1), where alpha is a hyperparameter.

So whenever we take the derivative of this function, we get the shape below: the negative side has a comparatively large gradient near zero, which smoothly decreases as z becomes more negative.

image.png

The only disadvantage of this activation function is that, because of the exponential, it takes more compute time than ReLU, Leaky ReLU, etc.
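A sketch of ELU and its derivative in NumPy (alpha defaults to 1.0, an assumption on my part; the `np.minimum` guard only avoids overflow in the unused branch, since `np.where` evaluates both):

```python
import numpy as np

def elu(z, alpha=1.0):
    # z for z > 0; alpha * (e^z - 1) for z <= 0, saturating smoothly at -alpha
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1.0))

def elu_derivative(z, alpha=1.0):
    # 1 for z > 0; alpha * e^z for z <= 0 (always strictly positive)
    return np.where(z > 0, 1.0, alpha * np.exp(np.minimum(z, 0.0)))
```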

6. PReLU - Parametric ReLU

It is similar to ReLU:

When z > 0, the output is z (i.e. max(z, 0)); when z < 0, the output is alpha * z.

If the alpha value is 0.01, it becomes Leaky ReLU.
If the alpha value is 0, it becomes plain ReLU.

Here alpha is parametric: it is a learned parameter, trained along with the weights rather than fixed in advance.

image.png

image.png
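A sketch of PReLU in NumPy. Here alpha is passed in as an argument; in a real network it would be a trainable parameter updated by gradient descent:

```python
import numpy as np

def prelu(z, alpha):
    # z when z > 0; alpha * z when z <= 0, with alpha learned during training
    return np.where(z > 0, z, alpha * z)

# Special cases mentioned in the notes:
# alpha = 0.01 recovers Leaky ReLU; alpha = 0 recovers plain ReLU.
```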

7. Swish

Formula: y = z * sigmoid(z). This is called self-gating, a gating idea also used in LSTMs.

It is computationally expensive.

Here z = summation of weights * inputs + bias.

It is typically only worth using when you have a very deep neural network (roughly more than 40 layers).

It solves the dead activation (dying ReLU) problem which we face with ReLU.

image.png

image.png
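The self-gating formula above can be sketched directly in NumPy (names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    # y = z * sigmoid(z): the input gates itself ("self-gating");
    # smooth, non-monotonic, and slightly negative for small negative z
    return z * sigmoid(z)
```

For large positive z, sigmoid(z) approaches 1, so swish(z) approaches z (ReLU-like); for large negative z, the output approaches 0 but the function never has an exactly-zero gradient region.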

8. Softplus Activation Function

The log is used to handle the negative values, i.e. when z < 0; positive values are handled by the same formula: softplus(z) = log(1 + e^z).

So instead of applying max(0, z), it uses this smooth approximation.

image.png

Green line - Softplus

Blue line - ReLU

image.png

9. Softmax Activation Function

The sigmoid activation function is used when we have binary classification.

For example, the output may say 60% dog, 40% cat.

What we do then is: when a class's value is > 50%, we take that class as the output.

image.png

The softmax activation function is used in the output layer when we have many categories to classify. Suppose we have the following neural network architecture.

Whenever we have more than 2 output categories, we use softmax.

If we pass an image, we want to see whether it belongs to cat, dog, monkey, or rat.

Using softmax, we get an output probability for each class.

image.png

We use the following formula, where x in the figure below is weights * inputs + bias.

image.png

Now assume that before applying the softmax function we get these raw scores (logits):

[40, 30, 10, 5]

When we apply the softmax formula, each x_j from [40, 30, 10, 5] is substituted into e^(x_j) / sum_k e^(x_k).

Because 40 is much larger than the other scores, its probability comes out at almost exactly 1 (about 0.99995); 30 gets about 4.5e-5, and 10 and 5 get negligible probability.

image.png

The final prediction is the class with the highest probability, here the first one.
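A quick numeric check of softmax on those logits in NumPy (the max is subtracted before exponentiating, a standard trick to avoid overflow with large logits like 40):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; e^40 alone would be enormous
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([40.0, 30.0, 10.0, 5.0]))
# probs sums to 1; the first class dominates because 40 >> 30 > 10 > 5
```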

Sigmoid and softmax are always kept in the last, i.e. output, layer.
